A prediction model based on deep learning and radiomics features of DWI for the assessment of microsatellite instability in endometrial cancer

Abstract
Background: To explore the efficacy of a prediction model that combines diffusion-weighted imaging (DWI) features extracted by deep learning (DL) and radiomics with clinical parameters and apparent diffusion coefficient (ADC) values for identifying microsatellite instability (MSI) in endometrial cancer (EC).
Methods: This study included a cohort of 116 patients with EC, who were divided into training (n = 81) and test (n = 35) sets. Conventional radiomics features and convolutional neural network (CNN)-based DL features were extracted from DWI. Random forest (RF) and logistic regression were adopted as classifiers. DL features, radiomics features, clinical variables, ADC values, and their combination were used to establish DL, radiomics, clinical, ADC, and combined models, respectively. Predictive performance was evaluated using the area under the receiver operating characteristic curve (AUC), total integrated discrimination index (IDI), net reclassification index (NRI), calibration curves, and decision curve analysis (DCA).
Results: The optimal predictive model, based on an RF classifier, comprised four DL features, three radiomics features, two clinical variables, and the ADC value. In the training and test sets, this model achieved AUCs of 0.989 (95% CI: 0.935–1.000) and 0.885 (95% CI: 0.731–0.967), respectively, representing varying degrees of improvement over the clinical, DL, radiomics, and ADC models (training AUC = 0.671, 0.873, 0.833, and 0.814; test AUC = 0.685, 0.783, 0.708, and 0.713, respectively). The NRI and IDI analyses showed that the combined model improved risk reclassification of MSI status compared with the clinical, radiomics, DL, and ADC models. The calibration curves and DCA indicated good agreement and clinical utility of this model, respectively.
Conclusions: The predictive model based on DWI features extracted by DL and radiomics, combined with clinical parameters and ADC values, could effectively assess MSI status in EC.

…5,6 Conventional testing assays, such as immunohistochemistry (IHC), polymerase chain reaction, and next-generation sequencing, are labor-intensive, invasive, and expensive for patients. 7,8 Therefore, developing a convenient, economical, and noninvasive method to detect the MSI status of EC is important for the well-being of affected patients.
Diffusion-weighted imaging (DWI) is a quantitative magnetic resonance imaging (MRI) technique that is widely used in the diagnosis and assessment of EC. 9,10 It not only provides detailed morphological information about lesion location and number at the macroscopic level but also quantifies the rate of water molecule movement within biological tissues at the microscopic level. 11 The evolution of information technology has paved the way for computer-aided diagnosis systems in medical imaging. Radiomics represents a significant breakthrough in this domain, enabling the discovery and exploitation of higher-order image features that may be difficult for radiologists to discern, thereby enhancing diagnostic accuracy. 12,13 Several studies have demonstrated the utility of radiomics in risk stratification, 14 clinical classification, 15 and prognostic prediction 16 for patients with EC. Deep learning (DL) represents a further evolution of radiomics. Compared with conventional radiomics, DL uses multilayer convolutional neural networks (CNNs) to extract image information, yielding highly selective and robust image features that improve the accuracy of disease diagnosis and assessment. 17,18 For instance, Chen et al. demonstrated the effectiveness of DL in assessing the depth of myometrial infiltration in patients with EC, 19 while Tao et al. reported that DL can improve the accuracy of EC diagnosis. 20 … 22,23 Moreover, DL features have been largely absent from this line of research.
Therefore, this study aimed to explore the feasibility of a predictive model based on DWI features extracted by DL and radiomics, combined with clinical parameters and the apparent diffusion coefficient (ADC) value, for differentiating between microsatellite stability (MSS) and MSI, with the goal of providing a novel, noninvasive pretreatment tool for evaluating MSI status in patients with EC.

| Study participants
Initially, data were retrieved from 276 patients who underwent pelvic MRI for suspected EC between June 2019 and August 2023. The exclusion criteria were as follows: (i) pathologically confirmed non-EC (n = 25); (ii) no MSI status assessment (n = 90); (iii) absent DWI sequences or DWI of insufficient quality for analysis (n = 17); (iv) radiotherapy, chemotherapy, or surgery before the pelvic MRI examination (n = 13); and (v) incomplete clinical or histopathological information (n = 15). Ultimately, 116 patients with EC were included in the study. Clinical variables, recorded at baseline, included age, maximum tumor diameter, and levels of carcinoembryonic antigen (CEA), carbohydrate antigen 125 (CA 125), carbohydrate antigen 199 (CA 199), and carbohydrate antigen 153 (CA 153). This study was approved by the local ethics committee, and the need for informed consent was waived.
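As a quick consistency check on the patient flow, the sketch below replays only the counts reported above (no new data) and applies the exclusions sequentially. The variable and function names are illustrative.

```python
# Replays the patient-flow counts reported in the text; no new data are introduced.
INITIAL_PATIENTS = 276

# (exclusion criterion, number of patients excluded), in the order listed above
EXCLUSIONS = [
    ("pathologically confirmed non-EC", 25),
    ("no MSI status assessment", 90),
    ("DWI absent or of poor quality", 17),
    ("radiotherapy, chemotherapy, or surgery before MRI", 13),
    ("incomplete clinical or histopathological information", 15),
]


def patient_flow(initial, exclusions):
    """Return (criterion, patients remaining) after each sequential exclusion step."""
    remaining = initial
    steps = []
    for criterion, n_excluded in exclusions:
        remaining -= n_excluded
        steps.append((criterion, remaining))
    return steps


if __name__ == "__main__":
    for criterion, remaining in patient_flow(INITIAL_PATIENTS, EXCLUSIONS):
        print(f"after excluding '{criterion}': {remaining} patients")
```

The final step leaves 276 − 160 = 116 patients, matching the reported cohort size.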


KEYWORDS
deep learning, diffusion-weighted imaging, endometrial cancer, microsatellite instability, radiomics

… in the supine position, feet first. The scanning field extended from the anterior superior iliac spine to the pubic symphysis. Axial T2-weighted imaging (T2WI) and DWI data were obtained for subsequent analysis. Details are provided in Table 1.

| Tumor segmentation
The ITK-SNAP software (version 3.8.0; http://www.itksnap.org) was used for tumor segmentation. First, a radiologist with 6 years of experience (reader 1) manually delineated regions of interest (ROIs) for the tumors slice by slice on the axial DWI images. Subsequently, a radiologist with 15 years of experience reviewed these ROIs and determined the final results. Finally, volumes of interest (VOIs) were constructed by integrating the ROIs from all slices of each tumor using the software's three-dimensional functionality. The image-processing workflow is presented in Figure 1.

| Calculation of the ADC values
The VOIs for each tumor were mapped to the corresponding ADC maps. The ADC value of each tumor was then calculated using the following equation:

$$\frac{S_b}{S_0} = \exp(-b \times \mathrm{ADC})$$

where $b$ is the diffusion-sensitizing factor (b value), and $S_0$ and $S_b$ are the signal intensities at a b value of 0 and at the b value indicated by the subscript, respectively.
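Rearranging the equation gives ADC = ln(S_0/S_b)/b. The sketch below applies this voxel by voxel and then summarizes the tumor by the mean ADC inside the VOI. It is only an illustration under assumptions: the text does not state which summary statistic was used (the mean is assumed here), the b value in the example is hypothetical (the actual b values are listed in Table 1), and the function names are illustrative.

```python
import numpy as np


def adc_map(s0, sb, b_value, eps=1e-6):
    """Voxel-wise ADC from S_b / S_0 = exp(-b * ADC), i.e. ADC = ln(S_0 / S_b) / b.

    s0, sb  : signal intensities at b = 0 and at b = b_value (same shape)
    b_value : diffusion-sensitizing factor in s/mm^2 (ADC is then in mm^2/s)
    eps     : small floor that avoids log(0) in background voxels
    """
    s0 = np.clip(np.asarray(s0, dtype=float), eps, None)
    sb = np.clip(np.asarray(sb, dtype=float), eps, None)
    return np.log(s0 / sb) / b_value


def mean_adc_in_voi(s0, sb, voi_mask, b_value):
    """Average the ADC map over the voxels of a binary VOI mask (assumed summary)."""
    adc = adc_map(s0, sb, b_value)
    return float(adc[np.asarray(voi_mask, dtype=bool)].mean())


if __name__ == "__main__":
    # Synthetic check with a hypothetical b value of 800 s/mm^2.
    true_adc = 1.0e-3
    s0 = np.full((4, 4, 4), 1000.0)
    sb = s0 * np.exp(-800 * true_adc)
    mask = np.zeros((4, 4, 4), dtype=bool)
    mask[1:3, 1:3, 1:3] = True
    print(mean_adc_in_voi(s0, sb, mask, 800))
```

For a synthetic tumor whose S_b equals S_0·exp(−800 × 1.0 × 10⁻³), the function recovers a mean ADC of 1.0 × 10⁻³ mm²/s.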

| DL features
The DWI data first underwent preprocessing and augmentation, including conversion of Digital Imaging and Communications in Medicine (DICOM) files to the Neuroimaging Informatics Technology Initiative (NIfTI) file format, alignment, resampling to a uniform isotropic resolution (1 mm³), and random mirror flipping along the image axes. Subsequently, a multi-scale CNN-based DL feature extraction model was devised to refine and extract tumor features. This model comprised eight convolutional layers with Rectified Linear Unit (ReLU) activation, three max-pooling layers, three upsampling layers, and one fully connected layer. The multi-scale network design comprises five bottom-up layers and three merged layers, wherein the larger scale focuses on the tumor's larger aspects while the smaller scale captures richer contextual information. During each training iteration, the DWI images were preprocessed separately and fed into the model. The DL features were then obtained from the fully connected layer. 26 The above process is shown in Figure 2. All operations were performed using self-written code (Supplementary 1) in Python (version 3.1.0), yielding 128 DL features per DWI volume.
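Two of the preprocessing steps above, isotropic resampling to 1 mm voxels and random axis mirror flipping, can be sketched with NumPy alone. This is an illustrative sketch, not the authors' Supplementary 1 code: the text does not specify the interpolation method, so nearest-neighbour resampling is assumed here for simplicity (trilinear or spline interpolation is common in practice), DICOM-to-NIfTI conversion and alignment are omitted, and the function names and example voxel spacing are hypothetical.

```python
import numpy as np


def resample_isotropic_nn(volume, spacing_mm, target_mm=1.0):
    """Nearest-neighbour resampling of a 3D volume to isotropic voxels.

    volume     : 3D array
    spacing_mm : original voxel size along each axis, e.g. (0.8, 0.8, 4.0)
    target_mm  : output voxel size (1 mm, as in the text)
    """
    volume = np.asarray(volume)
    indices = []
    for n_vox, spacing in zip(volume.shape, spacing_mm):
        n_out = max(1, int(round(n_vox * spacing / target_mm)))
        # physical centre of each output voxel, mapped back to an input index
        centres_mm = (np.arange(n_out) + 0.5) * target_mm
        idx = np.floor(centres_mm / spacing).astype(int)
        indices.append(np.clip(idx, 0, n_vox - 1))
    return volume[np.ix_(*indices)]


def random_axis_flip(volume, rng, p=0.5):
    """Mirror-flip the volume along each axis independently with probability p."""
    for axis in range(volume.ndim):
        if rng.random() < p:
            volume = np.flip(volume, axis=axis)
    return volume


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dwi = np.random.default_rng(1).random((64, 64, 20))       # toy DWI volume
    iso = resample_isotropic_nn(dwi, spacing_mm=(1.0, 1.0, 4.0))
    augmented = random_axis_flip(iso, rng)
    print(iso.shape, augmented.shape)                          # (64, 64, 80) twice
```

Flipping is applied only as training-time augmentation, so at feature-extraction time the resampled volume would normally be passed through unflipped.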

| Feature selection
Inter- and intra-observer agreement of the features was assessed on DWI images from 20 randomly selected patients. To estimate intra-observer reproducibility, reader 1 repeated the VOI segmentation on DWI 2 weeks later. To assess inter-observer reproducibility, reader 2, another radiologist with 17 years of experience, independently delineated VOIs and extracted features in the same manner. For radiomics and DL features, robust features were first selected using inter- and intra-observer intraclass correlation coefficients (ICCs) > 0.75 as the criterion. All features were then normalized using the Z-score method. Redundant features were subsequently eliminated using the Mann-Whitney U test and the least absolute shrinkage and selection operator (LASSO) algorithm.
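The selection cascade described above (ICC filter, Z-score normalization, Mann-Whitney U filter, LASSO) can be sketched in NumPy as follows. Several details are assumptions not stated in the text: the ICC form (two-way random effects, absolute agreement, single measurement, ICC(2,1)), the Mann-Whitney significance threshold (p < 0.05, normal approximation without tie correction), and the LASSO variant (a linear LASSO solved by coordinate descent with a fixed penalty; in practice the penalty is usually tuned by cross-validation). All function names are hypothetical.

```python
import math

import numpy as np


def icc_2_1(ratings):
    """ICC(2,1) for an (n_subjects, k_readings) array of one feature."""
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    grand = y.mean()
    ss_rows = k * ((y.mean(axis=1) - grand) ** 2).sum()   # between subjects
    ss_cols = n * ((y.mean(axis=0) - grand) ** 2).sum()   # between readings
    ss_err = ((y - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def zscore(x):
    """Column-wise Z-score normalization (constant columns are mapped to zero)."""
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0
    return (x - x.mean(axis=0)) / sd


def _average_ranks(values):
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):          # tied values share their mean rank
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U test, normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = _average_ranks(np.concatenate([a, b]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def lasso_cd(x, y, alpha, n_iter=200):
    """Minimize (1/2n)||y - x b||^2 + alpha*||b||_1 by coordinate descent.

    Assumes the columns of x are already centred (e.g. Z-scored)."""
    n, p = x.shape
    resid = y - y.mean()
    beta = np.zeros(p)
    col_sq = (x ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            resid += x[:, j] * beta[j]          # remove feature j from the fit
            rho = x[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= x[:, j] * beta[j]          # add it back with the updated weight
    return beta


def select_features(x, y, icc_intra, icc_inter, alpha=0.05, icc_min=0.75, p_max=0.05):
    """ICC filter -> Z-score -> Mann-Whitney U filter -> LASSO; returns kept column indices."""
    keep = np.flatnonzero((icc_intra > icc_min) & (icc_inter > icc_min))
    z = zscore(x[:, keep])
    pvals = np.array([mann_whitney_p(z[y == 1, j], z[y == 0, j]) for j in range(z.shape[1])])
    keep, z = keep[pvals < p_max], z[:, pvals < p_max]
    if keep.size == 0:
        return keep
    beta = lasso_cd(z, y.astype(float), alpha)
    return keep[beta != 0]
```

In a leakage-free workflow, the Z-score parameters and the selection itself would be estimated on the training set only and then applied unchanged to the test set.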

| Model development
The training set was formed by randomly selecting 70% of the patients from each of the MSS and MSI groups, and the test set comprised the remaining 30%.
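The class-stratified 70/30 split can be sketched as below. The rounding rule, random seed, and function name are assumptions. The class totals used in the usage note (42 MSI and 74 MSS) are simply the sums of the training and test counts reported in the Results.

```python
import numpy as np


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Randomly assign ~70% of each class to training and the rest to test.

    labels : 1D array of class labels (e.g. 1 = MSI, 0 = MSS)
    Returns (train_idx, test_idx) as sorted index arrays.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * len(idx)))   # per-class 70%
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.sort(train_idx), np.sort(test_idx)


if __name__ == "__main__":
    labels = np.array([1] * 42 + [0] * 74)   # 42 MSI, 74 MSS
    train, test = stratified_split(labels)
    print(len(train), len(test))             # 81 35
```

With 42 MSI and 74 MSS patients, rounding 70% of each class gives 29 + 52 = 81 training and 13 + 22 = 35 test patients, which matches the reported split.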
During model development, one preprocessor (min_max_scaler) and two classifiers, logistic regression (LR) and random forest (RF), were employed. LR settings: penalty factor, C; class weight, none; penalty type, L2; classification threshold, 0.5; tolerance, 0.0001. RF settings: class weight, none; maximum tree depth, 2; split criterion, Gini; minimum number of samples per leaf, 1; classification threshold, 0.5. In total, ten (2 × 5) prediction models were generated: each classifier was used to build four single-source models (clinical, radiomics, DL, and ADC) and one combined model (clinical + radiomics + DL + ADC). To prevent overfitting and ensure robustness, the final prediction model had to meet the following criteria: the AUC values in the training and test sets should be maximized, and the difference in AUC between the two sets should remain < 0.15. 27
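The model-selection rule above can be sketched as follows. AUC is computed as the Mann-Whitney probability that a randomly chosen MSI case scores higher than a randomly chosen MSS case, with ties counted as one half. The text does not say how "maximized" balances training AUC against test AUC, so ranking the eligible models by test AUC, with training AUC as a tie-breaker, is an assumption, as are the function names.

```python
import numpy as np


def roc_auc(y_true, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def select_model(candidates, max_gap=0.15):
    """Pick a model from {name: (auc_train, auc_test)}.

    Models whose train/test AUC gap is >= max_gap are discarded as overfitted;
    the rest are ranked by test AUC, then training AUC (assumed ranking rule).
    """
    eligible = {k: v for k, v in candidates.items() if abs(v[0] - v[1]) < max_gap}
    if not eligible:
        return None
    return max(eligible, key=lambda k: (eligible[k][1], eligible[k][0]))


if __name__ == "__main__":
    # AUCs reported in the Abstract for the five model types (training, test).
    reported = {
        "clinical": (0.671, 0.685),
        "DL": (0.873, 0.783),
        "radiomics": (0.833, 0.708),
        "ADC": (0.814, 0.713),
        "combined": (0.989, 0.885),
    }
    print(select_model(reported))   # combined
```

With the AUCs reported in the Abstract, every model passes the 0.15 gap criterion and the combined model is selected. If scikit-learn was used (the text does not say), the listed settings would correspond to `LogisticRegression(penalty="l2", tol=1e-4, class_weight=None)` and `RandomForestClassifier(max_depth=2, criterion="gini", min_samples_leaf=1, class_weight=None)`, with `MinMaxScaler` as the preprocessor.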

| MSI status assessment
MSI status was assessed by IHC staining of four mismatch repair (MMR) proteins encoded by the MMR genes mutL homolog 1 (MLH1), mutS homolog 2 (MSH2), mutS homolog 6 (MSH6), and PMS1 homolog 2 (PMS2). EC tissues with retained expression of all four MMR proteins were categorized as MSS, whereas EC tissues with loss of expression of one or more of these proteins were categorized as MSI. 28 This assessment was performed independently by two experienced pathologists, and any disagreements were resolved by consensus.
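The labeling rule above reduces to a simple check. In the sketch below, the input format (a dictionary of retained/lost flags per protein) and the function name are hypothetical.

```python
MMR_PROTEINS = ("MLH1", "MSH2", "MSH6", "PMS2")


def msi_status(ihc_expression):
    """Label a tumor from IHC results for the four MMR proteins.

    ihc_expression : dict mapping protein name -> True if expression is retained,
                     False if expression is lost.
    Returns "MSI" if any of the four proteins is lost, otherwise "MSS".
    """
    missing = [p for p in MMR_PROTEINS if p not in ihc_expression]
    if missing:
        raise ValueError(f"IHC result missing for: {', '.join(missing)}")
    return "MSI" if any(not ihc_expression[p] for p in MMR_PROTEINS) else "MSS"


if __name__ == "__main__":
    print(msi_status({"MLH1": False, "MSH2": True, "MSH6": True, "PMS2": True}))  # MSI
```

Raising an error for an incomplete panel mirrors the exclusion of patients without a complete MSI assessment, rather than silently labeling them MSS.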

| Statistical analysis
Data analyses were performed using SPSS 26.0 (IBM) and R 4.3.1 (R Foundation) in RStudio. Differences in variables were analyzed using the Mann-Whitney U test or the chi-square test. The diagnostic performance of each model was quantified using the area under the receiver operating characteristic (ROC) curve. The DeLong test, net reclassification index (NRI), and total integrated discrimination index (IDI) were used to assess the added value of the combined model. Calibration curves and decision curve analysis (DCA) were used for model validation and net clinical benefit assessment, respectively. Statistical significance was set at p < 0.05.
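The comparison statistics named above can be sketched in NumPy. These are illustrative implementations, not the SPSS/R routines actually used. The text does not specify whether a categorical or category-free NRI was computed, so the category-free (continuous) NRI is assumed. The net benefit follows the standard decision-curve definition, TP/n − FP/n × p_t/(1 − p_t), and the DeLong test uses the usual structural-component variance of two correlated AUCs. Function names are hypothetical.

```python
import math

import numpy as np


def continuous_nri(y, p_old, p_new):
    """Category-free NRI: net upward movement in events plus net downward movement in non-events."""
    y, p_old, p_new = (np.asarray(a, dtype=float) for a in (y, p_old, p_new))
    up, down = p_new > p_old, p_new < p_old
    ev, ne = y == 1, y == 0
    return (up[ev].mean() - down[ev].mean()) + (down[ne].mean() - up[ne].mean())


def idi(y, p_old, p_new):
    """IDI: gain in mean predicted risk for events minus gain for non-events."""
    y, p_old, p_new = (np.asarray(a, dtype=float) for a in (y, p_old, p_new))
    ev, ne = y == 1, y == 0
    return (p_new[ev].mean() - p_old[ev].mean()) - (p_new[ne].mean() - p_old[ne].mean())


def net_benefit(y, prob, threshold):
    """Decision-curve net benefit of treating patients with prob >= threshold."""
    y, prob = np.asarray(y), np.asarray(prob, dtype=float)
    treat = prob >= threshold
    n = len(y)
    tp = np.sum(treat & (y == 1))
    fp = np.sum(treat & (y == 0))
    return tp / n - fp / n * threshold / (1 - threshold)


def delong_test(y, s1, s2):
    """DeLong test for two correlated AUCs on the same patients; returns (auc1, auc2, p)."""
    y = np.asarray(y).astype(bool)

    def components(s):
        s = np.asarray(s, dtype=float)
        pos, neg = s[y], s[~y]
        psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
        return psi.mean(axis=1), psi.mean(axis=0)   # V10 per event, V01 per non-event

    v10_1, v01_1 = components(s1)
    v10_2, v01_2 = components(s2)
    auc1, auc2 = v10_1.mean(), v10_2.mean()
    cov = (np.cov(np.vstack([v10_1, v10_2])) / len(v10_1)
           + np.cov(np.vstack([v01_1, v01_2])) / len(v01_1))
    var = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    if var <= 0:                       # degenerate case: no variability left to test
        return auc1, auc2, 1.0
    z = (auc1 - auc2) / math.sqrt(var)
    return auc1, auc2, math.erfc(abs(z) / math.sqrt(2))
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all strategy, whose net benefit is prevalence − (1 − prevalence) × p_t/(1 − p_t), and with treat-none, whose net benefit is zero.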

| Clinical and ADC data
The training set comprised 81 patients (29 MSI and 52 MSS), and the test set comprised 35 patients (13 MSI and 22 MSS). No significant differences were observed in clinical and ADC data between the training and test sets. In the training set, CEA levels in the MSI group were higher than those in the MSS group, whereas in the test set, CEA levels did not differ significantly between the two groups. ADC values in the MSI group were significantly lower than those in the MSS group in both the training and test sets. No significant differences were observed in age, maximum diameter, CA 125, CA 153, or CA 199 between the MSS and MSI groups in either set. The details are summarized in Table 2 and Figure 3.

| Model validation and clinical benefit
In the training and test sets, the calibration curves and DCA showed that the combined model (clinical + radiomics + DL + ADC) not only exhibited good agreement between predicted probabilities and observed outcomes but also offered reliable clinical benefit for assessing MSI status in patients with EC (Figures 5 and 6).

| DISCUSSION
In this study, a combined model integrating two clinical variables, three radiomics features, four DL features, and the ADC value was constructed to predict MSI status in patients with EC. Compared with the clinical, radiomics, DL, and ADC models, this combined model exhibited improved diagnostic performance in both the training and test sets. Clinical variables encompass a range of indicators, such as patient age, lesion size, clinical symptoms, and serum tumor markers. Previous studies have shown that predictive models constructed solely from clinical variables often fail to assess disease status accurately because of their limited specificity. 29,30 This study considered age, tumor size, CEA, CA 125, CA 199, and CA 153; however, only CEA and CA 199 were retained in the final clinical model, which exhibited low diagnostic efficacy. This outcome is consistent with the aforementioned studies and suggests that a clinical model alone plays a limited role in evaluating MSI status in patients with EC. This study also revealed that CEA levels were higher in the MSI group than in the MSS group in both the training and test sets, with a significant difference observed in the training set. One possible explanation is that MSI tumors tend to be more malignant, resulting in higher CEA levels than in MSS tumors. 31

ADC, a parameter derived from DWI, quantifies the rate of diffusion of water molecules in biological tissues. 9 In the present study, ADC values in the MSI group were lower than those in the MSS group, which may reflect a higher degree of malignancy, a denser tissue structure, and greater restriction of water diffusion in MSI tumors. The ADC model achieved AUCs of 0.814 and 0.713 in the training and test sets, respectively. This finding is consistent with previous studies 32,33 and further supports the role of ADC values in assessing MSI status in patients with EC. However, these studies and our investigation indicate that the AUC of ADC values typically ranges from 0.7 to 0.9, which makes it challenging to determine the MSI status of patients with EC accurately. This limitation may arise because ADC offers only a one-dimensional summary of the lesion and because ADC values overlap to some extent between the MSS and MSI groups.

In this study, DL and radiomics were used to extract DWI features, and the corresponding models were constructed. Our findings underscore the effectiveness of DL and radiomics models in assessing MSI status in patients with EC. Notably, the DL model exhibited higher diagnostic efficacy than the radiomics model in both the training and test sets, in line with previous studies. 34,35 This consistency highlights the feasibility and relative superiority of DL models in distinguishing MSI from MSS in patients with EC. One possible explanation for the superior performance of DL over radiomics lies in the more complex and scalable structure of CNNs. Unlike radiomics, which typically relies on a relatively fixed set of predefined features, CNNs can not only extract deeper and more subtle features but may also mimic the elusive subconscious process that occurs when radiologists interpret DWI. 36,37 Therefore, DL can effectively identify subtle differences between MSS and MSI, resulting in better diagnostic efficacy. Moreover, a combined model comprising clinical variables, DL features, radiomics features, and ADC values was constructed. This model demonstrated improved diagnostic efficacy and reclassification ability to varying degrees compared with the clinical, DL, radiomics, and ADC models. This finding is consistent with previous research suggesting that a combined model integrating validated multidimensional information can provide a more comprehensive view of the lesion than a single model. 38,39 As a result, such a combined model can differentiate MSS from MSI more accurately in patients with EC.

The choice of algorithm is a key factor influencing the diagnostic performance of a prediction model. In this study, the LR and RF algorithms were chosen for model construction because of their simplicity, effectiveness, and widespread use. LR outperformed RF for the clinical and ADC models, whereas RF was the superior algorithm for the radiomics, DL, and combined models, consistent with previous studies. 27,38 These differences can largely be attributed to the characteristics of the two algorithms. RF is an ensemble learning method based on bagging, which allows it to capture nonlinear relationships and interactions among many features in classification and regression tasks. 39 In contrast, LR is a generalized linear model whose coefficients directly quantify the effect of each variable. 40 Additionally, the same algorithm can perform very differently across diseases because of the distinct pathophysiological characteristics of each condition. 38,39,41 Therefore, developing models with multiple algorithms and selecting the most suitable one may represent the future trend.
This study proposed a method for assessing MSI status in patients with EC based on DL and radiomics features of DWI, with promising performance. However, pathological testing should currently remain the preferred gold standard for MSI status assessment when conditions permit. In the future, with the continued development of AI-related technologies, the proposed method may be further refined and could become an effective supplement in resource-limited settings, helping clinicians evaluate patients' conditions more accurately.
This study has certain limitations. First, its retrospective nature might limit the value of the predictive model. Second, this was a single-center study, and the cost of MSI testing limited the sample size. Third, the results have not been externally validated, which may introduce bias. Fourth, the DWI data were acquired on scanners from different vendors; although standardized preprocessing was performed, this heterogeneity might have adversely affected the results. Finally, only two algorithms, LR and RF, were used for model construction, which might not have fully explored the potential for developing more stable and reliable models. In the future, we will attempt to expand the sample size, conduct prospective multicenter studies, optimize the imaging protocols, and incorporate additional algorithms to facilitate clinical dissemination and application of the model.

FIGURE 1 Workflow of image processing.

FIGURE 2 Schematic of deep learning feature generation. The model consists of eight convolutional layers with ReLU activation (red), three max-pooling layers (blue), three upsampling layers (yellow), and one fully connected layer (purple).

FIGURE 3 The heat map of features in the combined model (clinical + DL + radiomic + ADC).

FIGURE 4 The receiver operating characteristic curves of the clinical (purple), DL (green), radiomic (yellow), ADC (blue), and combined (clinical + DL + radiomic + ADC, red) models for the training (A) and test (B) sets.

TABLE 4 The NRI and IDI of different models.

FIGURE 5 The calibration curves of the clinical (purple), DL (yellow), radiomic (blue), ADC (green), and combined (clinical + DL + radiomic + ADC, red) models for the training (A) and test (B) sets.

FIGURE 6 The decision curve analysis curves of the combined (clinical + DL + radiomic + ADC) model for the training (A) and test (B) sets.